Skip to content

add nvhpc to test suite github runners#1317

Draft
sbryngelson wants to merge 11 commits intoMFlowCode:masterfrom
sbryngelson:nvhpc
Draft

add nvhpc to test suite github runners#1317
sbryngelson wants to merge 11 commits intoMFlowCode:masterfrom
sbryngelson:nvhpc

Conversation

@sbryngelson
Copy link
Copy Markdown
Member

Adds github runners for nvhpc builds on cpu and gpu

@github-actions
Copy link
Copy Markdown

Claude Code Review

Head SHA: 5dec31f
Files changed: 1

  • .github/workflows/test.yml

Summary

  • Adds 15 new matrix entries to the github job covering 5 NVHPC versions (23.11, 24.5, 24.11, 25.1, 25.3) × 3 targets (cpu, gpu-acc, gpu-omp).
  • NVHPC builds run inside the official nvcr.io/nvidia/nvhpc container on ubuntu-22.04 runners with nvfortran/nvc/nvc++ compilers and root-enabled OpenMPI.
  • GPU targets (gpu-acc, gpu-omp) exercise a build-only dry-run (--dry-run); actual test execution is restricted to cpu targets, which is appropriate since github-hosted runners lack GPU hardware.
  • All NVHPC jobs set continue-on-error: true (inherited from the job level), so failures are informational and do not block merges — matching the intended non-gating posture stated in the PR description.
  • Existing non-NVHPC steps are correctly guarded with !matrix.nvhpc conditions, and Intel/macOS setup steps are unaffected.

Findings

Observation (non-blocking): Missing MPI installation for NVHPC container
File: .github/workflows/test.yml, "Setup NVHPC" step (~line 178)

The NVHPC container image (nvcr.io/nvidia/nvhpc:*-devel-cuda_multi-ubuntu22.04) ships with NVHPC and CUDA toolchains but does not include a system OpenMPI installation. The CPU test step runs mfc.sh test ... --test-all without a --no-mpi flag, which will invoke MPI. The environment sets OMPI_ALLOW_RUN_AS_ROOT (suggesting MPI is expected to be used), but libopenmpi-dev / openmpi-bin are absent from the apt-get install list in the NVHPC setup step. This may cause CPU test failures at runtime if MFC's test runner tries to launch mpirun. Consider either adding openmpi-bin libopenmpi-dev to the NVHPC apt-get install line, or passing --no-mpi to the NVHPC Test step.

Minor style note: HDF5 not installed for NVHPC
File: .github/workflows/test.yml, "Setup NVHPC" step (~line 178)

The standard Ubuntu setup installs libhdf5-dev and hdf5-tools, but the NVHPC setup omits them. If any test case exercises HDF5 I/O (post_process output), the test run may fail. The standard non-NVHPC apt-get install list could serve as a reference for parity.

No issues found in the conditional logic, container image references, GPU flag mapping (gpu-acc--gpu acc, gpu-omp--gpu mp), or the CC/CXX/FC overrides. The ternary expression for runs-on and container is correct.

@github-actions
Copy link
Copy Markdown

Claude Code Review

Incremental review from: 5dec31f20ec2b50075f91eecae9f1caa98db638d
Head SHA: cd0d691df73cfa4e1528ffc02a3f8310c5a4cde9

New findings since last Claude review:

The two changes in this update are:

  1. nvhpc: [""] and target: [""] added to the base matrix so non-NVHPC entries have these keys defined with empty string values.
  2. The non-NVHPC job name updated from the bare string "Github" to format("Github ({0}, {1}, {2}, intel={3})", matrix.os, matrix.mpi, matrix.debug, matrix.intel).

Both changes are correctness improvements — no new issues introduced.

Observation (carry-over, still unaddressed): MPI and HDF5 missing in NVHPC container setup

The Setup NVHPC step still does not install openmpi-bin libopenmpi-dev or libhdf5-dev hdf5-tools. The Test (NVHPC) step runs --test-all without --no-mpi, so MPI invocations will fail at runtime. This was noted in the prior review and remains unresolved.

No other new high-confidence findings.

sbryngelson and others added 7 commits March 17, 2026 19:25
Test 5 NVHPC versions (23.11, 24.5, 24.11, 25.1, 25.3) on GitHub-hosted
runners using official NVIDIA container images. Each version runs both a
CPU build+test and a GPU (OpenACC) build-only target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…efaults

GitHub Actions include entries with new-only keys get merged into all
existing combos instead of creating new ones. Adding nvhpc and target
as main matrix axes with empty defaults means the include entries
overwrite original values, forcing creation of standalone combos.
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 18, 2026

Claude Code Review

Incremental review from: 23a0bd9
Head SHA: 5ef74e3

New findings since last Claude review:

  • toolchain/mfc/test/coverage.py: definitions.py removed from ALWAYS_RUN_ALL creates a coverage gap for existing-parameter mutations. The comment justifies the removal as safe because "adding a parameter doesn't affect tests that don't use it." However, modifying an existing parameter's default value in definitions.py alone (without touching any .fpp file) would not trigger any coverage-based tests and would not hit the should_run_all_tests path, potentially allowing a regression to slip through CI undetected. The previous membership in ALWAYS_RUN_ALL was the guard for this case.

  • toolchain/mfc/test/coverage.py: load_coverage_cache now sets current_hash = "" on OSError instead of returning None. Previously, if cases.py was unreadable, the function rejected the cache entirely and fell back to a full test run. Now it assigns an empty string and falls through — the hash mismatch warning fires, but the stale cache is still returned and used for test pruning. If cases.py is missing or inaccessible (e.g., after a partial checkout), tests that should run may be pruned. The prior return None behaviour was safer for this edge case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant